Data processing system

ABSTRACT

A data processing system is provided in which a data acquisition unit includes a data tag generator for generating data tags associated with acquired data items. The generated data tags are transmitted to a data store.

BACKGROUND OF THE INVENTION

[0001] It is well known to provide databases for the storage of individual data items. The data items may, for example, be individual photographs or other images, text documents, items of music or other audio information, personnel records, or any other such data items. The purpose of such databases is both to provide storage for the data items and to provide the facility to search through the stored data items in accordance with one or more criteria.

[0002] To allow such searching to be executed, the individual data items are indexed in some manner. Early systems of indexing were relatively simple and included such schemes as grouping a number of data items together within a single category, such that when a search was performed for items within that category the relevant data items could be retrieved.

[0003] However, in time, more advanced indexing schemes have been developed, with one such scheme involving the use of metadata. Metadata describes data, that is, it is data about data. The metadata associated with a data item may be as simple as a set of natural language comments referring to one or more elements of the data item. Alternatively, the metadata may be much more complex comments, or tags, that may refer to the structure of the data. For example, the metadata associated with a text document may include a reference to the subject matter of the document, its author, the number of words, the size of the data item, etc. As a further example, metadata associated with a graphical image such as a photograph may include tags identifying different elements of the image, for example that the image is a sunrise/sunset, that it includes people's faces, that it is a landscape, and so on.

[0004] The metadata that is generated for each data item stored within a database is ordinarily much more compact than the raw data itself. An analogy would be the use of classification cards in a public library. Each card represents a book stored in the library and contains certain items of information about the book, for example author and title. The cards themselves occupy a relatively small amount of space compared to the space occupied by the complete library and, by searching the cards, a set of books, for example by the same author, can be identified.

[0005] However, the metadata may in some circumstances be much more sizeable than the original data item. Maintaining the library analogy, a single novel, such as ‘Pride and Prejudice’, may have a number of books analysing it associated with it. These volumes of analysis are analogous to metadata for the original novel, yet would take up more shelf space than the novel itself.

[0006] Powerful search tools that exploit metadata are used to augment conventional search tools.

[0007] The increased functionality of databases and their associated search engines is one of the factors in their increased usage. Another factor is the increased usage of networked systems with local or remote network stations being linked to a central database. The link may be provided by a dedicated transmission cable or a shared transmission cable or wireless connection, or other such connection means. The quality and/or speed of the connection may pose a serious restriction on the amount of data that can be transmitted between the central database and the remote stations. It is therefore a problem to provide a large database with powerful search facilities, especially when dealing with non-textual data, that is easily and quickly accessible from remote stations. It may be equally problematic for data items to be exchanged between the database and the remote stations.

[0008] A further problem is the cost in processing terms required to generate the metadata. A large centralised database may require a prohibitive amount of processing power to deal with the generation of increasing amounts of metadata. Equally, local computers, such as domestic PCs, or portable devices such as personal digital assistants or cameras may not ordinarily have the processing power available to perform the metadata generation under all circumstances or in acceptable time frames.

[0009] At least some of the above problems apply to locally held databases. The consideration then is whether to perform the metadata generation locally or request an additional remote facility to do it even though there is no requirement or intention to transmit either the generated metadata or data item to a centralised database.

SUMMARY OF THE INVENTION

[0010] According to the present invention there is provided a data processing system comprising at least one data item acquisition unit and at least one data store, the data item acquisition unit including a data tag generator for generating a data tag associated with each data item, and the data item acquisition unit being arranged to transmit at least the data tag to the data store.

[0011] The data item acquisition unit may also transmit the data item itself to the at least one data store.

[0012] Additionally, the at least one data store may also include a data tag generator.

[0013] Each data store may be connected to one or more other data stores so as to provide a hierarchical arrangement of data stores.

[0014] The data item acquisition units may include a decision module that evaluates if the data tag should be generated at the data acquisition unit or if the data item should be sent to the data store and the data tag generated there. The evaluation may take into consideration the size and complexity of the data item and hence the processing power required to generate the data tag, and may also include an evaluation of the efficiency of transmitting the data item to the data store for data processing based on the size of the data item and the quality and/or speed of the connection between data acquisition unit and data store. The evaluation may also take into consideration any previously stipulated privacy requirements.

[0015] A use of the metadata is to preserve the privacy of the original data item. If the data item includes elements that it is desired to keep secret, only the metadata associated with the remaining elements need be transmitted to the data store. In a similar manner, a search query may be transmitted to the data store with only the metadata essential for the search without transmitting the original data item itself.
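By way of non-limiting illustration only, the withholding of restricted elements from the transmitted metadata might be sketched in Python as follows. The function name `filter_tag`, the representation of a data tag as a dictionary, and the field names are assumptions made purely for illustration and do not form part of the specification.

```python
def filter_tag(tag: dict, restricted: set) -> dict:
    """Return a copy of the data tag without any restricted fields."""
    return {key: value for key, value in tag.items() if key not in restricted}


# Hypothetical data tag generated for a photograph.
tag = {"subject": "sunset", "author": "A. Smith", "contains_faces": False}

# Only the metadata for the non-restricted elements would be transmitted.
outgoing = filter_tag(tag, restricted={"author"})
```

In this sketch the original tag is left unchanged at the data acquisition unit, and only the filtered copy would be sent to the data store.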

[0016] The data tag generator at the data acquisition unit is preferably arranged to detect a failure to generate a data tag for a data item. A failure to generate the metadata may occur due to one or more of a number of reasons. It is normal practice not to limit the amount of system memory required during metadata generation. There is therefore the possibility for a failure to occur due to the data acquisition unit running out of memory. Equally, the processing power required to generate the metadata may be greater than that available at the data acquisition unit at that time. A further cause of failure may be that the data acquisition unit is not appropriately configured with the relevant contextual data for the data item being presented.

[0017] The failure may be a ‘hard’ failure, in which case no metadata is generated and the data acquisition unit may transmit the data item to the data store, the data tag (metadata) generation then occurring at the data store. Alternatively, appropriate configuration information held by the data tag generator at the data store may be transmitted to the data acquisition unit to allow successful data tag generation to occur at the data acquisition unit. The failure may alternatively be a ‘soft’ failure, in which case the metadata generated prior to the failure occurring may be transmitted to the data store, or equally simplified metadata may be generated instead and transmitted to the data store.

[0018] Alternatively or additionally, the data acquisition unit may be arranged such that new data tag configuration information may be defined in relation to a presented data item and the new data tag configuration information transmitted to the data store for inclusion in the data tag generator located therein.

[0019] According to a second aspect of the present invention there is provided a method of processing data, the method comprising generating a data tag associated with a data item, said data tag generation occurring at a data acquisition unit, and transmitting at least said data tag to a data store.

[0020] Advantageously the data item may also be transmitted to the data store. Data tag generation may also occur at the data store.

[0021] The method may further include evaluating if it is appropriate to generate the data tags at the or each acquisition unit or to transmit the data item to a data store for data tag generation to occur there. The evaluation procedure may include determining the size and complexity of the data item to be processed, providing an estimate of the processing power required to generate the associated data tag in response to the determination, comparing the estimated processing power with the processing power of the available data tag generator, and in response to the comparison either generating the data tag locally or transmitting the data item to a data store.
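By way of non-limiting illustration only, the evaluation procedure described above might be sketched in Python as follows. The class `DataItem`, the functions `estimate_required_power` and `choose_location`, and the linear cost model (size multiplied by a complexity factor) are all assumptions introduced for illustration; the specification does not prescribe how the estimate is calculated.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    size_bytes: int
    complexity: float  # hypothetical unitless factor, 1.0 = baseline


def estimate_required_power(item: DataItem) -> float:
    """Estimate the work needed to tag the item (assumed linear model)."""
    return item.size_bytes * item.complexity


def choose_location(item: DataItem, available_power: float) -> str:
    """Return 'local' if the local generator can cope, otherwise 'remote'."""
    required = estimate_required_power(item)
    return "local" if required <= available_power else "remote"


# A simple item is tagged locally; a complex one is sent to the data store.
simple = choose_location(DataItem(size_bytes=1000, complexity=1.0), 5000.0)
complex_ = choose_location(DataItem(size_bytes=1000, complexity=10.0), 5000.0)
```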

[0022] The method may further comprise transmitting configuration information from a data store to a data acquisition device and configuring a data tag generator located at the data acquisition device in accordance with the configuration information.
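By way of non-limiting illustration only, the transfer of configuration information from a data store to a data acquisition device might be sketched in Python as follows. The class `TagGenerator`, the function `export_config`, and the representation of configuration information as a mapping from an item type to a list of tag fields are assumptions made for illustration only.

```python
class TagGenerator:
    """Hypothetical data tag generator holding per-item-type configuration."""

    def __init__(self, config=None):
        # Maps an item type to the tag fields that can be generated for it.
        self.config = dict(config or {})

    def can_tag(self, item_type: str) -> bool:
        return item_type in self.config

    def apply_config(self, new_config: dict) -> None:
        """Configure the generator in accordance with received information."""
        self.config.update(new_config)


def export_config(store_generator: TagGenerator, item_type: str) -> dict:
    """Export, from the data store, the configuration for one item type."""
    return {item_type: store_generator.config[item_type]}


store_generator = TagGenerator({"photo": ["subject"], "audio": ["duration", "genre"]})
unit_generator = TagGenerator({"photo": ["subject"]})

# The acquisition device lacks audio configuration, so it is requested
# from the data store and applied locally.
if not unit_generator.can_tag("audio"):
    unit_generator.apply_config(export_config(store_generator, "audio"))
```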

[0023] Additionally or alternatively, data tag generator configuration information may be transmitted from a data item acquisition device to a data store, said configuration information being defined in relation to a user presented data item.

[0024] According to a third aspect of the present invention there is provided a data item acquisition device comprising a data tag generator and being arranged to transmit a generated data tag associated with an acquired data item to a data store.

[0025] The data item acquisition device may further comprise an evaluation module that is arranged to determine the processing power required to generate a data tag for a data item, compare the required processing power to the processing power available at the data tag generator, and in response to the comparison either enable the data tag generator to generate the data tag or enable the transmission of the data item to a data store.

[0026] The data item acquisition device may form an integral or peripheral part of a personal computer. Additionally one or more data capture devices, for example an electronic camera, scanner or microphone, may be connected to an input of the data acquisition device. Alternatively the data acquisition unit may be integrated within a data capture or data storage device.

[0027] According to a fourth aspect of the present invention there is provided a data store arranged to store a plurality of data tags associated with respective data items, and arranged to export, upon request, data tag generator configuration information for use by data tag generators.

[0028] Preferably the data store is arranged to store the respective data items associated with the data tags. The data store may additionally comprise a data tag generator for generating data tags associated with data items input to the store.

[0029] Additionally the data store may comprise a search engine arranged to locate data tags conforming to a user search request and cause the data items associated with the located data tags to be output from the data store.
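By way of non-limiting illustration only, a search engine that locates data tags conforming to a search request and outputs the associated data items might be sketched in Python as follows. The function `search`, the in-memory dictionary used as the store, and the exact-match criteria are assumptions made for illustration; a practical data store would be expected to use an index.

```python
def search(store: dict, criteria: dict) -> list:
    """Return the data items whose tags match every search criterion.

    `store` maps an item identifier to a (tag, item) pair.
    """
    return [
        item
        for tag, item in store.values()
        if all(tag.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical store contents: each tag is held with its data item.
store = {
    "a": ({"subject": "sunset", "kind": "photo"}, "sunset.jpg"),
    "b": ({"subject": "portrait", "kind": "photo"}, "face.jpg"),
}

matches = search(store, {"subject": "sunset"})
```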

[0030] Preferably the data store is connected to one or more other data stores. Additionally the data store may further comprise an evaluation module that is arranged to determine the processing power required to generate a data tag for a data item, compare the required processing power to the processing power available at the data tag generator, and in response to the comparison either enable the data tag generator to generate the data tag or enable the transmission of the data item to a data store.

[0031] According to a fifth aspect of the present invention, there is provided a computer program product for causing a data processor to execute the method according to the second aspect of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0032] The present invention will now be described, by way of example, with reference to the accompanying drawings, in which:

[0033] FIG. 1 shows a schematic representation of a data processing system according to an embodiment of the present invention connected to a number of data input devices; and

[0034] FIG. 2 shows a further embodiment of the present invention having a multi-layer structure.

DETAILED DESCRIPTION OF THE INVENTION

[0035] FIG. 1 shows a data acquisition device or unit 2 connected to a data store 4. The data acquisition unit 2 is connected to one or more data input devices. Examples of data input devices that are shown are a discrete data storage unit 6, for example a hard disk, a digital camera 8, and a document scanner 10. Other input devices such as video or sound recorders could also be provided. Located in the data acquisition unit 2 is a data tag generator 12, also known as a metadata generator. The metadata generator 12 is arranged to process data items input from one or more of the data input devices to generate data tags or metadata for each data item. The data store 4 includes one or more data storage devices 14, such as known hard disk drives. Connected to the data storage devices 14 is a data query and/or indexing unit that is arranged to perform conventional data searching procedures. The data storage devices 14 are arranged to store either a plurality of data tags, a plurality of individual data items, or both data items and their associated data tags. The data acquisition unit 2 and data store 4 are connected by any suitable data transmission channel, for example by fibre optic cable, or by wireless connections.

[0036] In use, data items will be input to the data acquisition unit 2 from one of the data input devices 6-10. On receipt of the data items the metadata generator 12 will perform data processing to generate metadata associated with the input data items. The metadata may then be transmitted from the data acquisition unit 2 to the data store 4, together with, for example, a request from the data acquisition unit for the data store 4 to provide further data items that have similar metadata associated with them. This method of operation has the advantage that metadata generation is performed locally at the data acquisition unit 2 and not at the data store 4, thus freeing resources at the data store 4 that may be applied more efficiently to searching the contents of the data store 4 for requested data items. Additionally, having generated the associated metadata at the data acquisition unit 2, both the metadata and the associated data item may be transmitted to the data store 4 to be added to the data items and metadata stored there. In this way a database that is stored at the data store 4 may be expanded and updated relying solely on metadata and data items provided by remote stations without utilising the central resources at the central data store.

[0037] In a further embodiment of the present invention the data acquisition unit 2 may include a decision unit 18 connected to the metadata generator 12. The function of the decision unit 18 is to perform an evaluation of whether generation of the metadata for an input data item would be better performed locally at the data acquisition unit 2 or centrally at the data store 4. In these embodiments, the data store 4 also includes a metadata generator 20. The evaluation of where to perform the metadata processing may take into account a number of parameters, for example the size and complexity of the data item(s) and therefore the processing power required to perform the metadata processing, or the size of the data item(s) in comparison with the transmission abilities of the connection between the data acquisition unit 2 and the data store 4. The evaluation may also take into consideration any stipulated privacy requirements. For example, it may be stipulated by a user that the author or originator of a data item is not included in the metadata. This will have an impact on the complexity of the generated metadata. As a further example, if the connection between the data acquisition unit 2 and the data store 4 is of limited capacity, the decision unit 18 may evaluate that it is more efficient to generate the metadata locally at the data acquisition unit 2 using the metadata generator 12 rather than attempt to transmit the relatively large amount of data down the restricted transmission capacity of the connection between the data acquisition unit 2 and the data store 4. Alternatively, an evaluation may be made that it is more efficient to transmit the data item(s) unprocessed to the data store 4 to be processed by the more powerful metadata generator 20 located at the data store 4.
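By way of non-limiting illustration only, the part of the evaluation that weighs the size of the data item against the capacity of the connection might be sketched in Python as follows. The function names, the use of elapsed time as the measure of efficiency, and all of the numerical values are assumptions made for illustration; the specification does not prescribe any particular measure.

```python
def transfer_seconds(size_bytes: int, link_bytes_per_second: float) -> float:
    """Time to send a payload over the connection (assumed constant rate)."""
    return size_bytes / link_bytes_per_second


def prefer_local(
    item_bytes: int,
    tag_bytes: int,
    link_bytes_per_second: float,
    local_seconds: float,
    remote_seconds: float,
) -> bool:
    """Compare generating locally and sending the tag with sending the
    whole item and generating at the data store."""
    local_total = local_seconds + transfer_seconds(tag_bytes, link_bytes_per_second)
    remote_total = transfer_seconds(item_bytes, link_bytes_per_second) + remote_seconds
    return local_total <= remote_total


# Hypothetical 10 MB item with a 1 kB tag; the local generator is slower.
slow_link = prefer_local(10_000_000, 1_000, 10_000.0, 30.0, 2.0)
fast_link = prefer_local(10_000_000, 1_000, 100_000_000.0, 30.0, 2.0)
```

Under these assumed figures the restricted connection favours local generation, whereas the fast connection favours sending the unprocessed data item to the more powerful generator at the data store.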

[0038] In certain embodiments of the present invention the metadata generator 12 located at the data acquisition unit 2 is arranged to report any failures to generate metadata for data items. A failure to generate metadata for a data item may occur due to the metadata generator 12 located at the data acquisition unit 2 not being configured with appropriate contextual data to generate metadata for a particular kind of data item. Alternative or additional causes of failure may include running out of memory during generation of the metadata or insufficient processing resources being available. The available processing resources may vary depending on other tasks being performed by the data acquisition unit at any given time. When such a failure occurs, the decision unit 18 may either transmit the particular data item to the data store 4 in order for the metadata generator 20 located at the data store 4 to generate the metadata, request revised configuration information from the metadata generator 20 located at the data store 4 to enable the metadata generator 12 located at the data acquisition unit 2 to be reconfigured so that metadata can be generated for the data item at the data acquisition unit 2, generate simplified metadata that requires fewer system resources or less configuration information, or simply transmit to the data store 4 the metadata that had been successfully generated prior to the failure occurring.
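By way of non-limiting illustration only, two of the responses to a failure described above (forwarding the data item after a ‘hard’ failure, and transmitting partially generated metadata after a ‘soft’ failure) might be sketched in Python as follows. The exception classes, the function `process`, and the callback parameters are assumptions made for illustration; the reconfiguration and simplified-metadata responses are omitted for brevity.

```python
class HardFailure(Exception):
    """No metadata could be generated for the data item."""


class SoftFailure(Exception):
    """Generation stopped part-way; the partial metadata is attached."""

    def __init__(self, partial: dict):
        super().__init__("metadata generation stopped early")
        self.partial = partial


def process(item, generate, send_item, send_tag) -> str:
    """Attempt local generation and fall back according to the failure."""
    try:
        tag = generate(item)
    except SoftFailure as failure:
        send_tag(failure.partial)  # transmit what was generated before failing
        return "partial"
    except HardFailure:
        send_item(item)  # let the generator at the data store do the work
        return "forwarded"
    send_tag(tag)
    return "complete"


# Hypothetical generators used only to exercise the three outcomes.
def generate_ok(item):
    return {"kind": "photo", "subject": "sunset"}


def generate_hard(item):
    raise HardFailure("no contextual data for this item type")


def generate_soft(item):
    raise SoftFailure({"kind": "photo"})


sent_items, sent_tags, outcomes = [], [], []
for generator in (generate_ok, generate_hard, generate_soft):
    outcomes.append(process("image-1", generator, sent_items.append, sent_tags.append))
```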

[0039] It will be appreciated by those skilled in the art that the data acquisition unit 2 may be a dedicated processing unit or may be integrated as either hardware or software within a general personal computer or the like. In the latter case, the decision unit 18 may also take into account the demands on the computer's processor when evaluating whether to generate metadata at the data acquisition unit or not. It will be appreciated that metadata processing may be performed at the data acquisition unit 2 during what would otherwise be “idle” processing periods.

[0040] FIG. 2 shows an embodiment of the present invention utilising a multi-layered, hierarchical arrangement of data storage units. A number of “first layer” data stores 4 are provided. Each data store includes a metadata generator 20. Connected to each first layer data store 4 are one or more data acquisition units 2. In FIG. 2 the data acquisition units 2 are schematically shown as being connected to different data input sources. The sources shown include a digital camera 30, a document source 32, audio source 34 and video source 36. Each data acquisition unit 2 also includes a metadata generator 12. The data acquisition units 2 are arranged to operate in the same manner as those described with reference to FIG. 1.

[0041] Each of the first layer data stores 4 is connected to a second layer data store 38. Preferably, but not necessarily, the second layer data store 38 has increased storage capacity in comparison to the first layer data stores 4. Although only two layers of data stores are shown in FIG. 2, it will be appreciated that any number of layers can be used. In use, the decision making process that occurs at the data acquisition units 2, as described with reference to FIG. 1, also takes place at each of the levels of data stores. Thus the system is flexible enough to perform the metadata processing at whichever layer is deemed most appropriate.

[0042] In further embodiments, the data input sources, for example the digital camera 30 shown in FIG. 2, may themselves include a metadata generator. This allows a further sub-layer of metadata processing and decision making to be performed.

[0043] By providing metadata generators at the various different layers of the system the processing is distributed throughout the system. This provides the advantage that both processing power throughout the system and the use of transmission connections can be optimised. The processing power of the metadata generators at the different layers may either be identical, or may increase towards the highest layer. In the latter case, only the more complex or large data items would need to be processed by the more powerful metadata generators, with the simpler data items being processed at lower levels by individual metadata generators.
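By way of non-limiting illustration only, the selection of the layer at which a data item is processed might be sketched in Python as follows. The function `select_layer` and the representation of each layer by a single processing power figure are assumptions made for illustration; in practice each layer would be expected to make its own evaluation as described above.

```python
def select_layer(required_power: float, layer_powers: list) -> int:
    """Return the index of the lowest layer able to generate the tag.

    `layer_powers` lists the processing power available at each layer,
    ordered from the data acquisition unit (index 0) upwards.
    Raises ValueError if no layer has sufficient power.
    """
    for index, power in enumerate(layer_powers):
        if required_power <= power:
            return index
    raise ValueError("no layer can generate the tag")


# Hypothetical powers: acquisition unit, first layer store, second layer store.
layers = [10.0, 100.0, 1000.0]
simple_item_layer = select_layer(5.0, layers)
complex_item_layer = select_layer(500.0, layers)
```

In this sketch simpler data items are handled at the lowest layer, and only the more demanding items are passed up to the more powerful generators.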

[0044] In the same manner as described in relation to FIG. 1, the lower level metadata generators may be “updated” with new configuration information provided by metadata generators in the higher levels.

1. A data processing system comprising at least one data item acquisition unit and at least one data store, the data item acquisition unit comprising a data tag generator for generating a data tag associated with each data item, and the data item acquisition unit being arranged to transmit at least the data tag to the data store.
 2. A data processing system according to claim 1, wherein said data item acquisition unit is further arranged to transmit the data item to the at least one data store.
 3. A data processing system according to claim 2, wherein the at least one data store comprises a further data tag generator.
 4. A data processing system according to claim 3, wherein said data item acquisition unit comprises a decision module arranged to evaluate if the data tag should be generated at the data acquisition unit or if the data item should be sent to the data store and the data tag generated there.
 5. A data processing system according to claim 4, wherein said decision module is arranged to utilise data item size and complexity information to generate an estimate of the processing power required to generate an associated data tag and determine if said estimated processing power exceeds the processing power available.
 6. A data processing system according to claim 5, wherein said processing power available at said data item acquisition unit is variable.
 7. A data processing system according to claim 5, wherein said decision module is further arranged to evaluate the efficiency of transmitting the data item to the data store for data processing based on the size of the data item and the quality and/or speed of the connection between the data acquisition unit and the data store.
 8. A data processing system according to claim 5, wherein said decision module is further arranged to include one or more restrictions on data to be included in said data tag in said evaluation.
 9. A data processing system according to claim 1, wherein said data tag generator is arranged to detect a failure to generate a data tag for a data item.
 10. A data processing system according to claim 9, wherein in response to said failure the data item is transmitted to the data store with a request for the data tag to be generated therein.
 11. A data processing system according to claim 9, wherein in response to said failure any portion of said data tag generated prior to said failure is transmitted to said data store.
 12. A data processing system according to claim 9, wherein in response to said failure said data tag generator attempts to generate a simplified data tag.
 13. A data processing system according to claim 9, wherein further configuration information held by the data tag generator at the data store is transmitted to the data acquisition unit to allow successful data tag generation to occur.
 14. A data processing system according to claim 1, wherein said data tag generator is arranged such that data tag configuration information is defined in relation to a presented data item and the data tag configuration information is transmitted to the data store for inclusion in the data tag generator located therein.
 15. A data processing system according to claim 1, wherein said data store is connected to one or more other data stores so as to provide a hierarchical arrangement of data stores.
 16. A method of processing data including the step of generating a data tag associated with a data item at a data acquisition unit and transmitting at least said data tag to a data store.
 17. A method of processing data according to claim 16, wherein the data item is also transmitted to the data store.
 18. A method of processing data according to claim 17, wherein said data tag generation can also occur at the data store.
 19. A method of processing data according to claim 18, wherein said method includes an evaluation step for evaluating if it is appropriate to generate the data tags at the or each acquisition unit or to transmit the data item to a data store for data tag generation to occur there.
 20. A method of processing data according to claim 19, wherein the evaluation step includes determining the size and complexity of the data item to be processed, providing an estimate of the processing power required to generate the associated data tag in response to the determination, comparing the estimated processing power with the processing power of the available data tag generator, and in response to the comparison either generating the data tag locally or transmitting the data item to a data store.
 21. A method of processing data according to claim 19, wherein said evaluation step includes evaluating the efficiency of transmitting the data item to the data store for data processing based on the size of the data item and the quality and/or speed of the connection between the data acquisition unit and data store.
 22. A method of processing data according to claim 16, wherein the method further includes transmitting configuration information from a data store to a data acquisition device and configuring a data tag generator located at the data acquisition device in accordance with the configuration information.
 23. A method of processing data according to claim 16, wherein data tag generation configuration information is transmitted from a data item acquisition device to a data store, said configuration information being defined in relation to a user presented data item.
 24. A method of processing data according to claim 19, wherein said data store is connected to at least one further data store and said evaluation step is performed at at least one of said data stores.
 25. A data item acquisition device comprising a data tag generator and arranged to transmit a generated data tag associated with an acquired data item to a data store.
 26. A data item acquisition device according to claim 25 further comprising an evaluation module arranged to perform the evaluation procedure of claim 19.
 27. A data item acquisition device according to claim 25, wherein said data item acquisition device forms an integral part of a personal computing apparatus.
 28. A data item acquisition device according to claim 25, wherein one or more data capture devices selected from a list including an electronic camera, scanner and microphone is connected to an input of the data acquisition device.
 29. A data item acquisition device according to claim 25, wherein the data acquisition unit is integrated within a data capture or data storage device.
 30. A data store arranged to store a plurality of data tags associated with respective data items, and arranged to export, upon request, data tag generator configuration information for use by data tag generators.
 31. A data store according to claim 30, wherein said data store is arranged to store the respective data items associated with the data tags.
 32. A data store according to claim 30, wherein said data store comprises a data tag generator for generating data tags associated with data items input to the store.
 33. A data store according to claim 30, wherein said data store comprises a search engine arranged to locate data tags conforming to a user search request and cause the data items associated with the located data tags to be output from the data store.
 34. A data store according to claim 30, wherein said data store is connected to one or more other data stores.
 35. A data store according to claim 30, wherein said data store further comprises an evaluation module that is arranged to determine the processing power required to generate a data tag for a data item, compare the required processing power to the processing power available at the data tag generator, and in response to the comparison either enable the data tag generator to generate the data tag or enable the transmission of the data item to a data store.
 36. A computer program product arranged to cause a data processor to execute the method according to claim 16.
 37. A distributed metadata processing system comprising at least one data capture device and at least one remote data store in communication with the data capture device, the data capture device comprising metadata generation means for generating metadata associated with a captured data item, the data capture unit further comprising communication means for sending the metadata to the at least one remote data store.
 38. A data processing system comprising at least one data item acquisition unit and at least one remote data store, and in which both the data item acquisition unit and the remote data store include a metadata generator for generating metadata associated with acquired data, the data item acquisition unit further including a decision module arranged to evaluate if the metadata is most efficiently generated at the data acquisition unit or at the remote data store, the data acquisition unit being arranged to transmit at least either the generated metadata or the data item to the remote data store in response to said evaluation.
 39. A data processing system comprising at least one data item acquisition unit and at least one remote data store, the data item acquisition unit comprising a data tag generator for generating a data tag associated with each data item, and the data item acquisition unit being arranged to transmit at least the data tag to the remote data store, wherein said data tag generator is arranged to detect a failure to generate a data tag for a data item and in response to said failure to request the transmission of further configuration information held at the remote data store to allow successful data tag generation to occur.
 40. A method of generating a data tag associated with a data item, the method including the steps of: acquiring the data item using a data capture unit; evaluating whether to generate the data tag at the data capture unit or at a data store in communication with the data capture unit; and in response to the evaluation, communicating either the data item or generated data tag to the data store.
 41. A method of generating metadata associated with a data item, the method including: acquiring a data item at a data acquisition unit having a metadata generator; determining if the metadata generator has sufficient configuration information to successfully generate the metadata for the data item; and if the outcome of said determination is negative, requesting the transmission of further configuration information from a data store in communication with the data acquisition unit. 